-
Notifications
You must be signed in to change notification settings - Fork 22
Sq8 to Sq8 dist functions - ip and cosine [MOD-13170] #873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures. - Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h. - Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures. - Developed unit tests to validate the correctness of the new distance functions against expected results. - Ensured compatibility with existing optimization features for various CPU architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for SQ8-to-SQ8 distance functions for Inner Product and Cosine similarity metrics, where both vectors are scalar-quantized 8-bit representations. This extends the existing SQ8 functionality which only handled float-to-SQ8 comparisons.
Key changes:
- Implemented optimized SIMD versions for Intel (AVX512) and ARM (SVE, SVE2, NEON) architectures
- Added comprehensive test coverage for all optimization variants
- Integrated new benchmark suite for performance testing
- Refactored existing helper function names to avoid naming conflicts
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_spaces.cpp | Added unit tests for SQ8_SQ8 IP and Cosine functions across all architectures; improved test suite naming consistency |
| tests/benchmark/spaces_benchmarks/bm_spaces_sq8_sq8.cpp | New benchmark file for SQ8_SQ8 distance functions |
| tests/benchmark/spaces_benchmarks/bm_spaces_sq8.cpp | Removed empty comment lines (cleanup) |
| tests/benchmark/benchmarks.sh | Added spaces_sq8_sq8 to benchmark configurations |
| tests/benchmark/CMakeLists.txt | Added sq8_sq8 data type to build system |
| src/VecSim/spaces/functions/SVE2.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/SVE2.cpp | Implemented SVE2 function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/SVE.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/SVE.cpp | Implemented SVE function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/NEON.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/NEON.cpp | Implemented NEON function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.h | Added function declarations for SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.cpp | Implemented AVX512 function choosers for SQ8_SQ8 distance functions |
| src/VecSim/spaces/IP_space.h | Added public API declarations for SQ8_SQ8 distance functions |
| src/VecSim/spaces/IP_space.cpp | Implemented dispatcher functions that select appropriate SIMD implementation |
| src/VecSim/spaces/IP/IP_SVE_SQ8_SQ8.h | New file with SVE-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h | New file with NEON-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_AVX512F_SQ8_SQ8_BW_VL_VNNI.h | New file with AVX512-optimized SQ8_SQ8 IP and Cosine implementations |
| src/VecSim/spaces/IP/IP_AVX512F_SQ8_BW_VL_VNNI.h | Renamed helper function to avoid naming conflicts with new SQ8_SQ8 implementations |
| src/VecSim/spaces/IP/IP_AVX2_SQ8.h | Renamed helper function to avoid naming conflicts and fixed cosine normalization |
| src/VecSim/spaces/IP/IP.h | Added function declarations for SQ8_SQ8 base implementations |
| src/VecSim/spaces/IP/IP.cpp | Implemented baseline SQ8_SQ8 InnerProduct and Cosine functions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #873 +/- ##
==========================================
- Coverage 97.03% 97.02% -0.01%
==========================================
Files 126 127 +1
Lines 7455 7506 +51
==========================================
+ Hits 7234 7283 +49
- Misses 221 223 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… SQ8-to-SQ8 calculations
… NEON and AVX512 headers
…ocumentation accordingly
…tance assertion tolerance
… using AVX512 VNNI; add benchmarks and tests for new functionality
…VE, and AVX512; add corresponding selection functions and update tests for consistency.
…update benchmarks and tests for new functionality
…stance function tests
…itional compilation in tests
meiravgri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Please revert changes in files that are not related to this pr (all IP_*_SQ8.h files changes)
-
consider comparing IP_*UINT8.h optimized implmntation function to the new SQ8_SQ8 to ensure we get the best optimization
tests/utils/tests_utils.h
Outdated
| const auto *pVect1 = static_cast<const uint8_t *>(pVect1v); | ||
| const auto *pVect2 = static_cast<const uint8_t *>(pVect2v); | ||
|
|
||
| // Extract metadata from the end of vectors (likely already prefetched) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does it mean "(likely already prefetched)"?
tests/unit/test_spaces.cpp
Outdated
| float baseline = SQ8_Cosine(v1_orig.data(), v2_quantized.data(), dim); | ||
|
|
||
| #ifdef OPT_SVE2 | ||
| if (optimization.sve2) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE2(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "SVE2 with dim " << dim; | ||
| optimization.sve2 = 0; | ||
| } | ||
| #endif | ||
| #ifdef OPT_SVE | ||
| if (optimization.sve) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "SVE with dim " << dim; | ||
| optimization.sve = 0; | ||
| } | ||
| #endif | ||
| #ifdef OPT_NEON | ||
| if (optimization.asimd) { | ||
| unsigned char alignment = 0; | ||
| arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization); | ||
| ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_NEON(dim)) | ||
| << "Unexpected distance function chosen for dim " << dim; | ||
| ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01) | ||
| << "NEON with dim " << dim; | ||
| optimization.asimd = 0; | ||
| } | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid this (and all the following) change that is not related to sq8_sq8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR should also handle the fixes needed with regular SQ8, as it was also converted to be used by the disk implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to implement them for sure, but in a different pr
…Calculations - Refactored inner product calculations for SQ8 vectors using NEON and SVE optimizations. - Integrated UINT8_InnerProductImp for efficient dot product computation in NEON and SVE implementations. - Updated inner product functions to handle 64-element chunks for improved performance. - Adjusted distance function selection logic to ensure optimizations are applied only for dimensions >= 16. - Added tests for zero vectors and constant vectors to validate optimized implementations against baseline results. - Ensured consistency in assertions for symmetry tests across various optimization flags. - Improved code readability and maintainability by removing redundant code and comments.
- Updated inner product functions for NEON, SSE4, and SVE to streamline dequantization and reduce unnecessary calculations. - Consolidated common logic for inner product and cosine calculations across different SIMD implementations. - Enhanced the handling of vector normalization and quantization in unit tests, ensuring consistency in compressed vector sizes. - Adjusted benchmark tests to reflect changes in vector compression and distance function calls. - Corrected include paths for AVX512 implementations to maintain consistency across the codebase.
… compressed size calculations
| float SQ8_SQ8_InnerProductImp(const void *pVec1v, const void *pVec2v, size_t dimension) { | ||
| // Compute raw dot product using efficient UINT8 AVX512 VNNI implementation | ||
| // UINT8_InnerProductImp uses _mm512_dpwssd_epi32 for native integer dot product | ||
| int dot_product = UINT8_InnerProductImp<residual>(pVec1v, pVec2v, dimension); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| // NEON SQ8-to-SQ8 functions | ||
| #ifdef OPT_NEON | ||
| bool neon_supported = opt.asimd; | ||
| INITIALIZE_BENCHMARKS_SET_IP(BM_VecSimSpaces_SQ8_SQ8, SQ8_SQ8, NEON, 16, neon_supported); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that the dim_opt is not aligned with bm_spaces_uint.cpp values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just forgot to change it.
Describe the changes in the pull request
Added support to SQ8 to SQ8 Ip and Cosine spaces.
Intel:
AVX512_F_BW_VL_VNNI - needs it to support uint8 and in8 operations.
ARM:
SVE
NEON
The cosine functions assume that the vectors are normlized therefore don't divide by the norm.
Which issues this PR fixes
Main objects this PR modified
Mark if applicable